Common and rare variant associations with latent traits underlying depression, bipolar disorder, and schizophrenia

Genetic studies in psychiatry have primarily focused on the effects of common genetic variants, but few have investigated the role of rare genetic variants, particularly for major depression. In order to explore the role of rare variants in the gap between estimates of single nucleotide polymorphism (SNP) heritability and twin study heritability, we examined the contribution of common and rare genetic variants to latent traits underlying psychiatric disorders using high-quality imputed genotype data from the UK Biobank. Using a pre-registered analysis, we used items from the UK Biobank Mental Health Questionnaire relevant to three psychiatric disorders: major depression (N = 134,463), bipolar disorder (N = 117,376) and schizophrenia (N = 130,013) and identified a general hierarchical factor for each that described participants’ responses. We calculated participants’ scores on these latent traits and conducted single-variant genetic association testing (MAF > 0.05%), gene-based burden testing and pathway association testing associations with these latent traits. We tested for enrichment of rare variants (MAF 0.05–1%) in genes that had been previously identified by common variant genome-wide association studies, and genes previously associated with Mendelian disorders having relevant symptoms. We found moderate genetic correlations between the latent traits in our study and case–control phenotypes in previous genome-wide association studies, and identified one common genetic variant (rs72657988, minor allele frequency = 8.23%, p = 1.01 × 10−9) associated with the general factor of schizophrenia, but no other single variants, genes or pathways passed significance thresholds in this analysis, and we did not find enrichment in previously identified genes.


Gene-based association testing
We searched the literature for genome-wide or exome-wide association studies testing rare variants in gene-based tests for schizophrenia and bipolar disorder. Some studies of exome-sequencing data have identified genes associated with schizophrenia and bipolar disorder using collapsing tests, such as burden tests, including only those rare variants in each gene that are predicted to be damaging. For example, the SETD1A gene was significantly associated with schizophrenia at the whole-exome level in a meta analysis of cohorts involving European ancestries 7 , and the RBM12 gene was significantly associated with psychosis at the whole-genome level in an Icelandic sample and replicated in a Finnish sample 14 . In addition, SLC6A1 was recently found to be significantly associated with de novo mutations in a multinational schizophrenia cohort study 15 .
For gene-based burden tests, the PAGEANT power calculator 16 was used to estimate power in three different scenarios: 1) where minor allele frequency is independent of expected variance explained, 2) where minor allele frequency is independent of genetic effects measured in terms of unit per copy of an allele, and 3) where minor allele frequency is negatively correlated with genetic effect (see: https://andrewhaoyu.shinyapps.io/PAGEANT/).
The required alpha was set to 0.0001 and power was calculated for a range of values for expected variance explained (from 0.05% to 0.5%), with an expected 20% of variants in each gene being causal variants.
This indicated that there would be 0.999 power when the expected variance explained by a gene was above 0.16%, in all three scenarios, and that there would be >0.95 power when the expected variance explained by a gene was above 0.1% in all three scenarios.

Enrichment analysis
Enrichment analysis is used in three analyses in this study. First, it is used in hypothesis-free testing, to identify pathways that are associated with the latent psychiatric traits. Secondly, it is used to test whether genes associated with matched Mendelian disorders are enriched for genetic associations for latent psychiatric traits. Thirdly, it is used to test whether genes previously associated with schizophrenia and bipolar disorder through common variant tests (GWAS) will also be enriched for rare genetic variants associated with latent psychiatric traits underlying the psychiatric disorders.
We use competitive gene set analysis in MAGMA 17 to test whether the genes in our gene sets are more associated with these traits than all other genes outside the gene set. The competitive gene set analysis controls for confounding features of genes, such as gene length and linkage disequilibrium.
It should be noted that the authors of MAGMA remark that the number of genes in gene set enrichment analysis is related to the statistical power of these analyses. That is, when there is a larger number of genes in gene-set enrichment analyses, the statistical power to detect associations is higher. However, there is no publicly available or standard method to estimate power for enrichment analyses in MAGMA.

Pre-registration details and deviations
Methods were piloted with the analysis for depression, and pre-registered for the analyses for schizophrenia and bipolar disorder (available through the OSF: https://osf.io/w8jyu/).
We made three deviations from the pre-registration analysis:

Search terms used for OMIM database search for matched Mendelian phenotypes
In our pre-registered methods, we planned to search for Mendelian traits that presented with schizophrenia using the following search terms within the 'clinical features' subsection of entries on OMIM: schiz* OR mania OR manic.
However, in the actual analysis, we used the following search terms within the 'clinical features' subsection of entries on OMIM: schiz* OR psychotic OR psychosis. This change was because of an error in the search terms specified in pre-registration (duplication of the search terms used for bipolar disorder).
Also, we planned to search for Mendelian traits that presented with bipolar disorder using the following search terms within the 'clinical features' subsection of entries on OMIM: mania OR manic OR depress*.
However, in the actual analysis, we narrowed down the search terms for bipolar disorder by searching for the following terms within the 'clinical features' subsection of entries on OMIM: mania OR manic OR bipolar, to retain only entries that had a relevant match to bipolar disorder.

Outlier removal for schizophrenia phenotype
To remove outliers, we used the multivariate measure Robust Mahalanobis Distance 18,19 to exclude individuals who scored at or above the 99 th percentile of distance from the centroid of the data. This outlier removal procedure was not performed for schizophrenia, as the prevalence of affirmative responses was very low for the items f.20474.0.0 (strange force; N=343 responded 'yes' after outlier removal) and f.20468.0.0 (unjust plot; N=845 responded 'yes' after outlier removal), which prevented the calculation of a positive definite covariance matrix for factor analysis.

Filtering of OMIM results
In our preregistered analysis, we planned to retain all genes associated with each matched Mendelian disorder and grouped them as a "gene set." Competitive gene-set enrichment analysis would be performed (using the MAGMA software) to test whether they exhibit an enrichment of predicted deleterious variants associated with the latent traits we identified.
However, we noticed several matched Mendelian disorders on OMIM were not precisely linked to specific genes or loci but were instead associated with large segments of chromosomes due to the low specificity of linkage studies (these segments occasionally contained >100 genes). Since this would introduce noise into the gene set, as many genes would be unrelated to the matched disorder, we sought to filter these gene sets by length. We plotted the number of genes contained in each associated OMIM segment for each matched phenotype and decided to filter out segments that contained 5 or more genes, as the majority of segments contained <5 genes for matched phenotypes for all three phenotypes (shown in Figure S25).

GCTB estimation of heritability, selection and polygenicity
In our preregistered analysis, we planned to estimate the heritability, selection and polygenicity of each of the general latent traits using the GCTB software 20 . However, we found that the computational time and memory required to estimate these parameters for all the traits across all chromosomes was unfeasible, even using the nested model of GCTB with the openMPI parallelised version of the software. We therefore restricted the analysis to data from chromosome 1. The distribution of sigma (selection coefficient) and pi (polygenicity) are not expected to vary between chromosomes and are not additive, so estimates from a single chromosome are expected to be representative of estimates from across the genome.

Further exploratory analyses
Due to the suggestions from reviewers, we also calculated the Lambda 1000 and LDSC intercept using the LDSC software (reported in Table S13).

Supplementary methods
These methods are also available to view through the pre-registration plan available on the OSF (https://osf.io/w8jyu/).

Factor analysis for latent trait estimation
A literature search was conducted in order to select the range of questions in the Mental Health Questionnaire that were used for factor analysis, for each disorder. Further, it was used as a guide to determine the theoretical structure of factors and item loadings, and to determine whether to specify a higher-order general factor model. The details of this procedure were as follows: Studies were identified by searching PubMed ( https://pubmed.ncbi.nlm.nih.gov/) for papers relating to factor analysis for either "schizophrenia" or "bipolar disorder" (or "bipolar I disorder", "bipolar II disorder" or "cyclothymia"). Studies must have a sample size ≥300 and be published after 2004. Studies must have used factor analysis and must have used items directly relating to the mental disorder of interest (i.e. the items must relate to the disorder, rather than the disorder being used for external validation or comparison to another trait). Studies must also have reported a summary of factor loadings for individual items.
In order to select the range of questions in the Mental Health Questionnaire to be used for factor analysis, questions that did not ask about signs / symptoms in any of the diagnostic criteria (DSM-5 or ICD-10) were excluded. Questions must have been clinically relevant to the disorder of interest, rather than another outcome or trait that was compared to the psychopathology.
Questions that directly asked about signs / symptoms present on either or both the DSM and ICD criteria for the disorder were included as items. Furthermore, questions that asked about signs and symptoms that moderately-highly load onto the same factors as the direct questions do (>0.5) in the papers matching the criteria of the literature review were included.
For depression, the factor model matched that from Jermy et al. 21  Items related to symptoms during a potential episode of mania (in schizophrenia and bipolar disorder) were nested as follow-up items to questions that asked participants whether they had experienced a period of irritability or a period of feeling high, excited or hyper for 2 weeks. We coded negative responses to these questions as negative responses to the nested items.

Missing values
For items where individuals responded with "don't know" or "prefer not to answer", missing values were imputed using responses to answered questions. Since items were coded using ordinal scales, individuals' missing values on MHQ items were imputed by predicting them with an ordinal regression model that uses their responses on the other variables, with the regressionImp() function in the VIM package in R. This was done for items relating to each disorder separately.
Individuals who still had any missing values after this procedure (for example, because they had answered "don't know" or "prefer not to answer" to several questions) were removed from further analysis.

Outlier removal
To detect individuals who were likely outliers, we considered individuals' responses on all disorderrelated questions of the Mental Health Questionnaire, in a multivariate analysis. The generalised minimum covariance determinant method (GenMCD) was used to detect outliers in this ordinal dataset.
This approach is a variant of the minimum covariance determinant method, which randomly samples the data to find a subsample with the lowest covariance (i.e. a cluster that excludes a potential group of outliers), and uses this as the "centroid" 18,19 . GenMCD specifically uses Chi-squared distance and correspondence analysis to detect outliers in ordinal data (instead of Euclidean distance, which is used in Mahalanobis calculations) 22 .
The OuRS and GSVD packages in R was used to conduct GenMCD outlier detection, with the function cat.mcd(). 500 subsamples of the data were randomly generated to find a subsample with the smallest covariance; the size of the subsample was set to alpha=0.5 (the smallest subsample size allowable).
Individuals scores on this measure (Robust Mahalanobis Distance) that were at or above the 99 th percentile will be considered outliers and removed before further analysis. This was done for items relating to each disorder separately.

Exploratory factor analysis
Half the participants with phenotype data were randomly selected to conduct exploratory factor analysis, and the remaining half were used for confirmatory factor analysis subsequently. Kurtosis and skew were computed for all variables.
To assess the suitability of the data for factor analysis, Bartlett's test of sphericity and the Kaiser- In order to select the number of factors to retain, eigenvalues were estimated and plotted as a Scree plot to estimate the number of factors that should be retained in dimensionality reduction. Additionally, parallel analysis was conducted using the fa.parallel() function in the psych package, using the weighted least squares (wls) factoring method. Finally, the Minimum Average Partial (MAP) criterion and very simple structure (vss) criterion were calculated using the oblimin rotation method (an oblique rotation method, which allows for slight correlation between factors) and the minimum residual factoring method. The MAP criterion provides a lower bound of the number of factors that the data should be reduced to, while parallel analysis provides an upper bound.
For factor rotation, we used weighted least squares factoring with oblique geomin rotation, and used Thurstone's rules for complexity to select preferable models: The factors from the model solution was chosen for retention and extracted using the weighted least squares (wls) factoring method and geominQ rotation (an oblique rotation method), using the fa() function in the psych package.

Confirmatory factor analysis
The model solution from exploratory factor analysis was validated using the remaining half of the dataset. The Comparative Fit Index (CFI) and TLI should be > 0.9 but > 0.95 is preferable, RMSEA and Standardised Root Mean Square (SRMR) should be < 0.08 but < 0.05 is preferable. Factor scores were calculated using the empirical Bayes method using the lavPredict() function in the lavaan package in R.

Enrichment analysis
We grouped all matched genes (described below) as a gene set and performed competitive gene-set enrichment analysis using MAGMA to test for an enrichment of predicted deleterious variants in these genes.

Matched Mendelian disorders
To identify genes linked to Mendelian disorders exhibiting relevant clinical features, we conducted an advanced search on OMIM: tx_clinical_features:X, where X was depress* (for depression); schiz* OR psychosis OR psychotic (for schizophrenia); and bipolar OR mania OR manic (for bipolar disorder). We manually filtered search results to relevant phenotypes and restricted associated regions to those that contained <5 genes each ( Figure S24).

Matched common variant GWAS
We obtained gene-level summary data from the three largest GWAS of matched psychiatric illnesses: Wray et al. 23

Quality control
Out of the 503,328 individuals in the total UK Biobank sample, genetic data was collected from 488,377 participants. DNA was extracted from blood samples that were collected during recruitment and genotyped in 106 sequential batches across collection centres.
Genotyping was carried out using two similar genotyping arrays: 49,950 were genotyped using the Genotype data was aligned to Build GRCh37 and will be subjected to several steps of quality control: a maximum missing genotype filter of 0.02 for genetic variants and 0.02 for participants, exclusion of non-Caucasians, of related individuals, of gender/sex mismatches, and a Hardy-Weinberg Equilibrium threshold of 1x10 -8 . Participants' relatedness was estimated prior to release by the UK Biobank, using the KING software 29 . We removed participants with a relatedness >0.044 with others using a greedy algorithm (GreedyRelate, i.e. removing the child in a child-parent trio) 30 .
Individuals' phenotype scores will be adjusted for the first 20 principal components, which will be estimated using fastPCA with a SNP window size of 1500 SNPs, a SNP window shift of 150 SNPs, and a linkage disequilibrium threshold of r 2 > 0.02 for pruning. Additionally, their phenotype scores will be adjusted for their collection centre, genotype batch.

Imputation
The UK Biobank uses a multi-step process for imputation. After performing quality control on genotyped markers, pre-phasing is conducted using the SHAPEIT3 software, and imputes variants with the IMPUTE4 software (for fast imputation with the IMPUTE2 method). Two reference panels are used for imputation 31 . Firstly, the 1000 Genomes phase 3 reference panel 32 and the UK10K reference panels 33 were merged: these panels include SNPs, short indels and larger structural variants and totals 12,570 haplotypes. Secondly, the HRC reference panel has only SNPs but includes more haplotypes (64,976) and is more useful for imputing rare variants 34 . Thus, SNPs are imputed using both panels and HRCimputed SNPs are retained preferentially when imputations differed 31 .
Regarding the quality of rare variant imputation, Pistis et al. 35

Functional prediction of genetic variants
Variants were annotated through the ANNOVAR annotation software according to gene-level and functional-level annotation, with ensGene for gene-level annotation 37 and dbNSFP33a for functional The functional impact of variants were predicted using several annotation tools: dbscSNV 39 ,

Heritability estimation
Bayesian approaches have been developed to estimate SNP heritability while jointly estimating other parameters 44,45 . The GCTB software that uses this approach was developed as an alternative to GCTA 20 . This software has a method called BayesS, which estimates SNP heritability (h 2 ), the polygenicity of a trait (π), and also estimates the relationship between a SNPj's allele frequency (p) and SNP effect size (βj) as a parameter (S). This is conducted using a Bayesian mixture prior that is normally distributed with a mean of zero: GCTB produces similar point estimates of heritability to GREML and can also explicitly model polygenicity and the relationship between SNP effect size and MAF, while being more computationally efficient with large datasets. These qualities make it more suitable for research that aims to estimate the contribution of rare variants to the heritability of a trait 20 .
To deal with the large sample size in this study, we used the nested BayesS model for analysis in Heritability estimation will be conducted using first 20 principal components (calculated from genotype data) as numerical covariates. We will estimate 95% high-density intervals around the point estimates, using the HDIntervals package in R.

Enrichment testing
For each psychiatric trait, we searched the OMIM database for entries that mentioned matched phenotypes, manually filtered entries to retain only those that described the intended meaning of the word (e.g. psychological depression rather than 'depression' of a parameter). We remapped associated gene loci to build GRCh37 using the NCBI remap tool. 46 These MIM genes/loci are listed in Table S3 This resulted in 97 loci for depression, 40 loci for bipolar disorder, and 108 loci for schizophrenia.
None of the MIM loci overlapped with genes found in common variant GWAS for schizophrenia, bipolar disorder or depression.

OMIM matched phenotypes for depression
Search criteria: Gene Map Search -'tx_clinical_features:depression (Entries with: gene map locus; Retrieve: gene map)'

Depression phenotype
Participants' answers on 18 depression-related questions in the UK Biobank (7 questions from the Generalised Anxiety Disorder Assessment (GAD7), 2 questions relating to subjective well-being and 9 questions from the Patient Health Questionnaire (PHQ9; see supplementary materials) were used to create a general internalising score through hierarchical factor analysis. This was used to create a continuous phenotype for genetic association testing.  Figure S3 displays the proportion of responses that were considered missing values for each question ("don't know" or "prefer not to answer"), while figure 1 displays responses to questions that were not considered missing values.
Across all questions about depressive symptoms, the majority of participants responded that they had experienced the depressive symptoms "not at all" in the last two weeks. Participants responded that they experienced depressive symptoms with varying frequencies. The question asking whether participants experienced "recent feelings of tiredness or low energy" had the highest proportion of participants who responded in the affirmative (50%), while the question asking whether participants experienced "recent thoughts of suicide or self-harm" had the lowest proportion of participants who responded in the affirmative (4%).
NA responses were most common (2.3%) for the question asking participants whether they believed their own life was meaningful, and least common (0.15%) for the question asking participants whether they had experienced recent poor appetite or overeating.

Schizophrenia phenotype
Participants' answers on 12 schizophrenia-related questions in the UK Biobank (4 questions from the PHQ-9, 4 questions from the CIDI: Mania scale and 4 questions from the CIDI: Psychosis scale) were used to create a general schizophrenia factor score through hierarchical factor analysis. This was used to create a continuous phenotype for genetic association testing. Across all questions about schizophrenia symptoms, the majority of participants responded that they had experienced the schizophrenia symptoms "not at all" in the last two weeks, or that they had not experienced symptoms of hallucinations or delusions, or that they had not experienced a period of irritability or of feeling high, excited or hyper.
Participants responded that they experienced schizophrenia symptoms with varying frequencies. The question asking whether participants experienced "recent feelings of tiredness or low energy" had the highest proportion of participants who responded in the affirmative (50%), while the question asking whether participants experienced "a strange force trying to communicate with me" had the lowest proportion of participants who responded in the affirmative (1%).
NA responses were most common (3.54%) for questions nested in the period of irritability or of feeling high, excited or hyper (i.e. experiencing racing thoughts, feeling more restless than usual, feeling more confident than usual, feeling easily distracted), and least common (0.15%) for the question asking participants whether they had experienced recent trouble concentrating on things.

Bipolar disorder phenotype
Participants' answers on 15 bipolar disorder-related questions in the UK Biobank (7 questions from the CIDI: Depression scale and 8 questions from the CIDI: Mania scale) were used to create a general bipolar disorder factor score through hierarchical factor analysis. This was used to create a continuous phenotype for genetic association testing.

Raw responses to questions in the Mental Health Questionnaire are shown ("don't know" and "prefer not to answer" responses are not shown). Negative answers to either of the leading CIDI: Mania questions (experience of a period of irritability or a period of feeling high, excited or hyper) result in a coding of "no episode" for the following questions. The no episode coding is treated equivalently to a no response in further analysis. The same coding was applied to negative answers to either of the leading CIDI: Depression questions (period of feeling depressed for 2 weeks or period of lost interest for 2 weeks) which ask about symptoms experienced during that period. Colours, percentages and alignment are used only for illustrative purposes in this figure; they represent the presence of symptoms (orange for absence of symptoms; blue for presence of symptoms). Percentages labelled on the left and right side of the chart represent the total proportion of participants who responded with an
orange (e.g. "not at all") or blue response respectively.

Fig S6. Non-responses to bipolar disorder-related questions in the Mental Health Questionnaire of the UK Biobank
Raw NA responses to questions in the Mental Health Questionnaire ("don't know" and "prefer not to answer") are shown. 10 of these questions had the same coding for responses (504 coding in UK Biobank), and are displayed together as panel A; these 10 questions have an option for "prefer not to answer". 9 questions had different coding, and are described in panel B; these 9 questions have an option for "don't know" and an option for "prefer not to answer." Figure S6 and S7 display the responses to bipolar disorder-related questions in the Mental Health Questionnaire. These two figures visualise the responses to questions about symptoms experienced during a period of irritability or a period of feeling high, excited or hyper; and symptoms experienced during a period of lost interest or feeling depressed for 2 weeks. Figure 7 displays the proportion of responses that were considered missing values for each question ("don't know" or "prefer not to answer"), while figure 6 displays responses that were not considered missing values.
A majority of participants (55%) responded that they had experienced a period of feeling depressed for 2 weeks, though fewer (40%) responded that they had experienced a period of lost interest for 2 weeks. The majority of participants responded that they had not experienced a period of irritability or a period of feeling high, excited or hyper.
Participants responded that they experienced bipolar disorder symptoms with varying frequencies. The question asking whether participants experienced "felt tired or low on energy" had the highest proportion of participants who responded in the affirmative (44%), while the question asking whether participants agreed with the statement "I was more creative / had more ideas than usual" (during a period of irritability or a period of feeling high, excited or hyper) had the lowest proportion of participants who responded in the affirmative (3%).
NA responses were most common for questions nested in the period of feeling depressed or a loss of interest for 2 weeks (8.77% for experiencing a gain or loss in weight without trying during this period), and least common (0.15%) for the question asking whether they had experienced a period of feeling depressed for 2 weeks (0.26%).

Depression phenotype
Out of the total sample in the Mental Health Questionnaire (157,358 individuals), 8401 individuals responded "don't know" or "prefer not to answer" to one or more depression-related questions. For items where individuals responded with "don't know" or "prefer not to answer", an individual's missing values were imputed using responses to other questions they had answered, using the function regressionImp() in the VIM package in R.
Individuals who still had any missing values after this procedure (for example, because they had answered "don't know" or "prefer not to answer" to several questions and their values for a particular question could not be imputed) were removed from further analysis. This resulted in N=155,246 individuals with phenotype data, 6,289 of whom had their missing responses imputed. Figure S8 displays the number of missing responses to depression-related questions after missing values were imputed using this procedure.

Schizophrenia phenotype
Out of the total sample in the Mental Health Questionnaire (157,366 individuals), 10,938 individuals responded "don't know" or "prefer not to answer" to one or more schizophrenia-related questions. For items where individuals responded with "don't know" or "prefer not to answer", an individual's missing values were imputed using responses to other questions they had answered, using the function regressionImp() in the VIM package in R.
Individuals who still had any missing values after this procedure (for example, because they had answered "don't know" or "prefer not to answer" to several questions and their values for a particular question could not be imputed) were removed from further analysis. This resulted in N=148,681 individuals with phenotype data, 2,253 of whom had their missing responses imputed. Figure S9 displays the number of missing responses to schizophrenia-related questions after missing values were imputed using this procedure.

Bipolar disorder phenotype
Out of the total sample in the Mental Health Questionnaire (157,366 individuals), 41,493 individuals responded "don't know" or "prefer not to answer" to one or more bipolar disorder-related questions. For items where individuals responded with "don't know" or "prefer not to answer", an individual's missing values were imputed using responses to other questions they had answered, using the function regressionImp() in the VIM package in R.
Individuals who still had any missing values after this procedure (for example, because they had answered "don't know" or "prefer not to answer" to several questions and their values for a particular question could not be imputed) were removed from further analysis. This resulted in N=135,606 individuals with phenotype data, 19,733 of whom had their missing responses imputed. Figure S10 displays the number of missing responses to bipolar disorder-related questions after missing values were imputed using this procedure.

Fig S9. Non-responses to bipolar disorder-related questions in the Mental Health Questionnaire of the UK Biobank after imputation of missing values
NA responses to questions in the Mental Health Questionnaire ("don't know" and "prefer not to answer") are shown after missing values were imputed. 8 of these questions had the same coding for responses (504 coding in UK Biobank), and are displayed together as panel A; these 16 questions have an option for "prefer not to answer". 6 questions had different coding, and are described in panel B; these 6 questions have an option for "don't know" and an option for "prefer not to answer."

Depression phenotype
Individuals' scores were calculated for a multivariate outlier detection measure, Robust Mahalanobis Distance 22 . Those who scored at or above the 99 th percentile on this measure were considered outliers and were removed before further analysis. In the depression data, 1,553 individuals exceeded this threshold, which left N=153,693 individuals with phenotype data.

Schizophrenia phenotype
In a deviation from the pre-registered method, individuals were not removed if they were considered outliers. This was because two of the symptoms (experience of strange force trying to harm me and experience of unjust plot to harm me) had very low prevalence (1%), meaning that most individuals who responded with those symptoms would be removed during the outlier removal procedure, and the correlations between these two items could not be computed.

Bipolar disorder phenotype
Individuals' scores were calculated for a multivariate outlier detection measure, Robust Mahalanobis Distance 22 . Those who scored at or above the 99 th percentile on this measure were considered outliers and were removed before further analysis. In the bipolar-disorder data, 1,356 individuals exceeded this threshold, which left N=134,250 individuals with phenotype data.

Fig S11. Responses to bipolar disorder-related questions in the Mental Health Questionnaire of the UK Biobank after imputation of missing values and removal of outliers
Responses to questions in the Mental Health Questionnaire are shown ("don't know" and "prefer not to answer" responses are not shown) after missing values were imputed and outliers were removed.

Depression phenotype
Half the dataset (N = 76,846) was randomly selected to conduct ordinal exploratory factor analysis.
To assess the normality of the data, Mardia's test of multivariate normality was conducted on a random

The first 18 principal components are shown, in order of the components. The eigenvalues of components decrease as the component number increases. The point of inflection (after 3 components) suggests the number of components to be retained.
In post-hoc analysis, the 6-factor model solution was adjusted by comparing the Bayesian Information Criterion (BIC) between models with varying number of items that had loadings higher than 0.3 or 0.4.
The solution adjustment that was chosen was a 6-factor model solution adjusted for items that have ≥3 loadings above 0.3 or no single loading above 0.4 but retaining the items that were depression related in the PHQ-9.
The maximum Tucker-Lewis Index (TLI) and minimum Root Mean Square Error of Approximation (RMSEA) were found in a model that was reduced to 5 factors, where items with low loadings had been removed, thus removing a 6th factor. These 5 factors were therefore chosen for retention and extracted using the weighted least squares (wls) factoring method and geominQ rotation (an oblique rotation method), using the fa function in the psych package.
In this model solution, the TLI was 0.98, the RMSEA was 0.048, the BIC was 6591.66, the mean item complexity was 1.3, and the cumulative variance explained by the model was 0.72.

Schizophrenia phenotype
Half the dataset (N = 74,341) was randomly selected to conduct ordinal exploratory factor analysis.
To assess the factorability of the data, Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) test were conducted. Bartlett's test of sphericity assesses whether the correlation matrix of items in the data were significantly different from an identity matrix (the null hypothesis of no underlying factor structure). The chi-squared statistic from Bartlett's test was 139,604.4 (p = 0, df = 66), indicating factorability for the dataset. Since Bartlett's chi-squared statistic is also a function of sample size, the KMO test statistic was evaluated as well. As the overall MSA from the KMO test was 0.77, this indicated factorability of the data.
In order to select the number of factors to retain, Eigenvalues were estimated and plotted as a Scree plot to estimate the number of factors that should be retained in dimensionality reduction. This In post-hoc analysis, the 3-factor model solution was adjusted by comparing the Bayesian Information Criterion (BIC) between models with varying number of items that had loadings higher than 0.3 or 0.4 or items that had moderate-high loadings on multiple factors. The solution adjustment that was chosen was a 3-factor model solution without the item ("I was more confident than usual"), as it had moderatehigh loadings on two factors.
These 3 factors were therefore chosen for retention and extracted using the weighted least squares (wls) factoring method and geominQ rotation (an oblique rotation method), using the fa function in the psych package.
In this model solution, the TLI was 0.934, the RMSEA was 0.093, the BIC was 15969.43, the mean item complexity was 1, and the cumulative variance explained by the model was 0.7.

Figure S15. Factor structure of the chosen model solution for schizophrenia-related questions
Item loadings and residuals are not depicted. Abbreviations for factor labels are as follows: Psy = psychotic, Ngt = negative, Ds2 = disorganised, Sch = general schizophrenia factor. Abbreviations for items are as follows, described with the factors that they load onto:

Bipolar disorder phenotype
Half the dataset (N = 67,125) was randomly selected to conduct ordinal exploratory factor analysis.
To assess the factorability of the data, Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) test were conducted. Bartlett's test of sphericity assesses whether the correlation matrix of items in the data were significantly different from an identity matrix (the null hypothesis of no underlying factor structure). The chi-squared statistic from Bartlett's test was 658,159.4 (p = 0, df = 66), indicating factorability for the dataset. As the overall MSA from the KMO test was 0.93, this indicated factorability of the data.
In order to select the number of factors to retain, Eigenvalues were estimated and plotted as a Scree plot to estimate the number of factors that should be retained in dimensionality reduction. This

The point of inflection (after 3-4 components) suggests the number of components to be retained.
A 3-factor model solution showed the best fit to the data, and no post-hoc adjustments were justified according to Thurstone's rules.
These 3 factors were therefore chosen for retention and extracted using the weighted least squares (wls) factoring method and geominQ rotation (an oblique rotation method), using the fa function in the psych package.
In this model solution, the TLI was 0.941, the RMSEA was 0.108, the BIC was 45880.73, the mean item complexity was 1.1, and the cumulative variance explained by the model was 0.84.

Depression phenotype
The model solution from exploratory factor analysis was validated using the remaining half of the dataset (N = 76,847).
In the confirmatory factor analysis, the model fit statistics were as follows. The CFI was 0.999 and the TLI was 0.998. The RMSEA was 0.023 and the Standardised Root Mean Square Residual (SRMR) was 0.026. This indicated a good model fit.
A hierarchical general factor was specified to correlate with all of the 5 factors. In the hierarchical model, the omega total was 0.9, and the model fit statistics were as follows. The CFI was 0.992 and the TLI was 0.990. The RMSEA was 0.039 and the SRMR was 0.031. This indicated a good model fit.

Schizophrenia phenotype
The model solution from exploratory factor analysis was validated using the remaining half of the dataset (N = 74,340).
In the confirmatory factor analysis, the model fit statistics were as follows. The CFI was 0.996 and the TLI was 0.995. The RMSEA was 0.018 and the Standardised Root Mean Square Residual (SRMR) was 0.048. This indicated a good model fit.
A hierarchical general factor was specified to correlate with all of the 3 factors. In the hierarchical model, the omega total was 0.88, the explained common variance of the general factor was 0.41, and the model fit statistics were as follows. The CFI was 0.996 and the TLI was 0.995.
The RMSEA was 0.018 and the SRMR was 0.048. This indicated a good model fit.

Bipolar disorder phenotype
The model solution from exploratory factor analysis was validated using the remaining half of the dataset (N = 67,124).
In the confirmatory factor analysis, the model fit statistics were as follows. The CFI was 0.999 and the TLI was 0.999. The RMSEA was 0.024 and the Standardised Root Mean Square Residual (SRMR) was 0.045. This indicated a good model fit.
A hierarchical general factor was specified to correlate with all of the 3 factors. In the hierarchical model, the omega total was 0.93, the explained common variance of the general factor was 0.73, and the model fit statistics were as follows. The CFI was 0.999 and the TLI was 0.999.
The RMSEA was 0.023 and the SRMR was 0.044. This indicated a good model fit.
Individuals' scores on all the factors were calculated and plotted (shown below). Scores on the hierarchical factor (termed the "internalising factor") were used as the continuous phenotype in genetic association testing that followed.

Factor scoring
Depression phenotype Individuals' scores on all the factors were calculated and plotted (shown below). Scores on the hierarchical factor (termed the general schizophrenia factor) were used as the continuous phenotype in genetic association testing that followed. Individuals' scores on all the factors were calculated and plotted (shown below). Scores on the hierarchical factor (termed the general bipolar disorder factor) were used as the continuous phenotype in genetic association testing that followed.

Genomic inflation and QQ plots
To derive the number of cases in the continuous phenotypes (to calculate the lambda 1000), we calculated the number of participants with a phenotype score above one standard deviation of the mean.

Heritability estimation and genetic correlations
We used the reported effective sample size (Neff) in each manuscript, and for the latent factors we calculated these using the formula as mentioned in Mullins et al. 24 Neff = 4 * n_cases * n_controls / (n_cases + n_controls)